Integrating sequence and gene expression information predicts genome-wide DNA-binding proteins and suggests a cooperative mechanism

Authors

  • Shandar Ahmad
  • Philip Prathipati
  • Lokesh P Tripathi
  • Yi-An Chen
  • Ajay Arya
  • Yoichi Murakami
  • Kenji Mizuguchi

Abstract

DNA-binding proteins (DBPs) perform diverse biological functions, ranging from transcription to pathogen sensing. Machine learning methods can not only identify DBPs de novo but also provide insights into their DNA-recognition dynamics. However, it remains unclear whether existing methods that accurately predict DNA-binding sites in known DBPs can also identify novel DBPs. Moreover, sequence information is blind to the cell- and disease-specific contexts of DBP activity, whereas the under-utilized knowledge in public gene expression data offers great promise. To address these issues, we developed novel methods for predicting DBPs by integrating sequence-derived and gene expression-derived features, and applied them to explore the human, mouse and Arabidopsis proteomes. Although our sequence-based models outperformed the gene expression-based ones, some proteins with weaker DBP-like sequence features were correctly predicted using gene expression-based features, suggesting that these proteins acquire tangible DBP functionality in a conducive gene expression environment. Analysis of motif enrichment among the co-expressed genes of the top 100 candidate DBPs from hitherto unannotated genes provides further avenues for exploring their functional associations.
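The abstract does not specify which sequence or expression features the models use, so the following is only a minimal sketch of the general idea of integrating the two feature types. It assumes amino acid composition as a hypothetical stand-in for the sequence-derived features, a z-scored expression profile as a hypothetical stand-in for the expression-derived features, simple concatenation ("early fusion") to combine them, and a plain logistic regression as the classifier. None of these choices is claimed to be the authors' method, and the toy sequences and expression values are invented for illustration.

```python
import math

# Hypothetical sequence features: amino acid composition (AAC) over the 20 standard residues.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq):
    """Fraction of each standard amino acid in seq (non-standard letters are ignored)."""
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def zscore(values):
    """Standardize one gene's expression profile across samples (hypothetical expression features)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd if sd else 0.0 for v in values]


def integrate(seq, expr):
    """Early fusion: concatenate the sequence and expression feature vectors."""
    return aac(seq) + zscore(expr)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient-descent logistic regression, used here as a simple stand-in classifier."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that the protein is a DBP."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Invented toy data: label 1 = DBP (K/R-rich, rising expression), 0 = non-DBP.
train = [
    ("KRKRAKRG", [1, 2, 3, 4], 1),
    ("RRKKSKRA", [2, 3, 4, 6], 1),
    ("DEDEAGLL", [4, 3, 2, 1], 0),
    ("EEDLLAGD", [6, 4, 3, 2], 0),
]
X = [integrate(s, e) for s, e, _ in train]
y = [label for _, _, label in train]
w, b = train_logreg(X, y)

# Score an unseen, hypothetical protein with its expression profile.
p_new = predict_proba(w, b, integrate("KRKRKR", [1, 2, 3, 5]))
print(f"P(DBP) = {p_new:.3f}")
```

Concatenation keeps each feature type intact and lets a single classifier weigh both. Training a sequence-only and an expression-only model separately (by passing only `aac(seq)` or only `zscore(expr)`) is the obvious way to compare the two sources, in the spirit of the comparison the abstract describes.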


Similar resources

Gamma reactivation using the spongy effect of KLF1-binding site sequence: an approach in gene therapy for beta-thalassemia

Objective(s): β-thalassemia is one of the most common genetic disorders in the world. As one of the promising treatment strategies, fetal hemoglobin (Hb F) can be induced. The present study was an attempt to reactivate the γ-globin gene by introducing a gene construct containing KLF1 binding sites to the K562 cell line. Materials and Methods: A plasmid containing a 192 bp sequence with two repe...


Comparative bioinformatics analysis of a wild diploid Gossypium with two cultivated allotetraploid species

Background: Gossypium thurberi is a wild diploid species that has been used to improve cultivated allotetraploid cotton. G. thurberi belongs to the D genome, which is an important wild bio-source for cotton breeding and genetic research. To a certain degree, chloroplast DNA sequence information is a versatile tool for species identification and phylogenetic inference in plants. Different ch...


Post-translational changes of histones, methylation level, and ERβ protein level in the cumulus cell genome of infertile women with endometriosis

Background: Endometriosis (which affects up to 50% of infertile women) is one of the major causes impacting female infertility. Endometriosis, defined as the presence of endometrial glands and stroma outside the uterine tissue, causes a wide range of functional disorders in the process of follicular development and changes in the follicular milieu, resulting in the formation of an incompetent o...


IDENTIFICATION, ISOLATION, CLONING AND SEQUENCING A PARTIAL ANNEXIN GENE FROM AUREOBASIDIUM PULLULANS

Background and Objectives: Annexin is the common name for genes and proteins that were identified as calcium-dependent phospholipid-binding proteins. Recently a more complex set of functions has been recognized for this superfamily of proteins including in vesicle trafficking, cell division, apoptosis, calcium signalling, mineralization, crystal nucleation inside the extracellular organelle...


Molecular detection of proteolytic activity of human parechovirus 2A protein by gene expression

Parechoviruses form one of the nine genera in the Picornaviridae family and include two human pathogens: human parechovirus types 1 and 2 (Hpev1 and Hpev2). The genome of picornaviruses encodes a single polyprotein, which undergoes a cleavage cascade performed by virus-encoded proteases to give the final virus proteins. The primary cleavage is performed by the 2A protein, and this step is critical for vi...


Volume 46

Publication date: 2018